Treatment effect modification due to comorbidity: Individual participant data meta-analyses of 120 randomised controlled trials

Background People with comorbidities are underrepresented in clinical trials. Empirical estimates of treatment effect modification by comorbidity are lacking, leading to uncertainty in treatment recommendations. We aimed to produce estimates of treatment effect modification by comorbidity using individual participant data (IPD). Methods and findings We obtained IPD for 120 industry-sponsored phase 3/4 trials across 22 index conditions (n = 128,331). Trials had to be registered between 1990 and 2017 and have recruited ≥300 people. Included trials were multicentre and international. For each index condition, we analysed the outcome most frequently reported in the included trials. We performed a two-stage IPD meta-analysis to estimate modification of treatment effect by comorbidity. First, for each trial, we modelled the interaction between comorbidity and treatment arm adjusted for age and sex. Second, for each treatment within each index condition, we meta-analysed the comorbidity–treatment interaction terms from each trial. We estimated the effect of comorbidity measured in 3 ways: (i) the number of comorbidities (in addition to the index condition); (ii) presence or absence of the 6 commonest comorbid diseases for each index condition; and (iii) using continuous markers of underlying conditions (e.g., estimated glomerular filtration rate (eGFR)). Treatment effects were modelled on the usual scale for the type of outcome (absolute scale for numerical outcomes, relative scale for binary outcomes). Mean age in the trials ranged from 37.1 (allergic rhinitis trials) to 73.0 (dementia trials) and percentage of male participants range from 4.4% (osteoporosis trials) to 100% (benign prostatic hypertrophy trials). The percentage of participants with 3 or more comorbidities ranged from 2.3% (allergic rhinitis trials) to 57% (systemic lupus erythematosus trials). We found no evidence of modification of treatment efficacy by comorbidity, for any of the 3 measures of comorbidity. This was the case for 20 conditions for which the outcome variable was continuous (e.g., change in glycosylated haemoglobin in diabetes) and for 3 conditions in which the outcomes were discrete events (e.g., number of headaches in migraine). Although all were null, estimates of treatment effect modification were more precise in some cases (e.g., sodium-glucose co-transporter-2 (SGLT2) inhibitors for type 2 diabetes—interaction term for comorbidity count 0.004, 95% CI −0.01 to 0.02) while for others credible intervals were wide (e.g., corticosteroids for asthma—interaction term −0.22, 95% CI −1.07 to 0.54). The main limitation is that these trials were not designed or powered to assess variation in treatment effect by comorbidity, and relatively few trial participants had >3 comorbidities. Conclusions Assessments of treatment effect modification rarely consider comorbidity. Our findings demonstrate that for trials included in this analysis, there was no empirical evidence of treatment effect modification by comorbidity. The standard assumption used in evidence syntheses is that efficacy is constant across subgroups, although this is often criticised. Our findings suggest that for modest levels of comorbidities, this assumption is reasonable. Thus, trial efficacy findings can be combined with data on natural history and competing risks to assess the likely overall benefit of treatments in the context of comorbidity.


Methods and findings
We obtained IPD for 120 industry-sponsored phase 3/4 trials across 22 index conditions (n = 128,331). Trials had to be registered between 1990 and 2017 and have recruited �300 people. Included trials were multicentre and international. For each index condition, we analysed the outcome most frequently reported in the included trials. We performed a two-stage IPD meta-analysis to estimate modification of treatment effect by comorbidity. First, for each trial, we modelled the interaction between comorbidity and treatment arm adjusted for age and sex. Second, for each treatment within each index condition, we meta-analysed the comorbidity-treatment interaction terms from each trial. We estimated the effect of comorbidity measured in 3 ways: (i) the number of comorbidities (in addition to the index condition); (ii) presence or absence of the 6 commonest comorbid diseases for each index condition; and (iii) using continuous markers of underlying conditions (e.g., estimated glomerular filtration rate (eGFR)). Treatment effects were modelled on the usual scale for the type of outcome (absolute scale for numerical outcomes, relative scale for binary outcomes). Mean age in the trials ranged from 37.1 (allergic rhinitis trials) to 73.0 (dementia trials) and percentage of male participants range from 4.4% (osteoporosis trials) to 100% a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 (benign prostatic hypertrophy trials). The percentage of participants with 3 or more comorbidities ranged from 2.3% (allergic rhinitis trials) to 57% (systemic lupus erythematosus trials). We found no evidence of modification of treatment efficacy by comorbidity, for any of the 3 measures of comorbidity. This was the case for 20 conditions for which the outcome variable was continuous (e.g., change in glycosylated haemoglobin in diabetes) and for 3 conditions in which the outcomes were discrete events (e.g., number of headaches in migraine). Although all were null, estimates of treatment effect modification were more precise in some cases (e.g., sodium-glucose co-transporter-2 (SGLT2) inhibitors for type 2 diabetes-interaction term for comorbidity count 0.004, 95% CI −0.01 to 0.02) while for others credible intervals were wide (e.g., corticosteroids for asthma-interaction term −0.22, 95% CI −1.07 to 0.54). The main limitation is that these trials were not designed or powered to assess variation in treatment effect by comorbidity, and relatively few trial participants had >3 comorbidities.

Conclusions
Assessments of treatment effect modification rarely consider comorbidity. Our findings demonstrate that for trials included in this analysis, there was no empirical evidence of treatment effect modification by comorbidity. The standard assumption used in evidence syntheses is that efficacy is constant across subgroups, although this is often criticised. Our findings suggest that for modest levels of comorbidities, this assumption is reasonable. Thus, trial efficacy findings can be combined with data on natural history and competing risks to assess the likely overall benefit of treatments in the context of comorbidity.

Why was this study done?
• There is often uncertainty about how treatments for single conditions should be applied to people with 2 or more long-term conditions (multimorbidity).
• People with multimorbidity are underrepresented in randomised controlled trials (RCTs); however, trials rarely report whether the efficacy of treatment differs by the number of additional long-term conditions (comorbidities) or in the presence of specific comorbidities.

What did the researchers do and find?
• We analysed individual-participant data from 120 RCTs including 128,331 participants across 23 index conditions to assess whether the efficacy of treatment differed depending on the number of comorbidities or in the presence of any of the most common comorbidities.
• We found no evidence that treatment efficacy differed depending on the number of comorbidities, or by any specific comorbidities, for any of the index conditions and treatment comparisons included in this analysis.

Introduction
Multimorbidity, the presence of 2 or more long-term conditions, is a global clinical and public health priority [1,2]. Most people with a given long-term condition also have comorbidities (referring to additional long-term conditions in the context of an index condition). There is uncertainty about how individual long-term conditions should be managed in the presence of comorbidities [3]. A major driver of this uncertainty is the underrepresentation of people with multimorbidity in randomised controlled trials (RCTs) [4,5]. Trial populations are typically younger, healthier, and have fewer comorbidities than people treated in routine clinical practice. This has led clinical guideline developers to caution against the application of single-disease recommendations for people with multimorbidity [6]. However, despite the challenges to clinical management posed by this uncertainty, the efficacy of treatments in the context of comorbidity is rarely assessed. It is therefore not clear, for most treatments, whether relative treatment efficacy differs in people with comorbidity. Assessing individual differences in response to medical treatments is a controversial topic. Differences in treatment efficacy are typically assessed using subgroup analyses. Subgroup analyses in RCTs seek to assess if treatment efficacy differs by patient characteristics [7]. Testing of prespecified subgroup effects is common practice in RCTs of medical therapies [8,9]. As such, subgroup analyses seek to inform stratified approaches to patient care by identifying groups for whom recommendations may be tailored [10]. However, trials rarely report subgroup analyses by levels of comorbidity or for specific comorbidities. Furthermore, subgroup analyses are inconsistently executed and reported, as well as suffering a number of well-documented statistical pitfalls [7,11], notably that analysis of subgroups risks false positive and false negative findings [11]. RCTs are generally not powered to detect subgroup effects, and as such, the sample size in subgroup analyses is frequently insufficient to detect clinically significant differences in treatment efficacy even if these were to exist [12]. Conversely, by testing multiple subgroups, the likelihood of chance findings (i.e., false positives) is increased [7,12]. The limitations of trial-level subgroup analyses can be reduced using meta-analyses. However, when considering whether treatment efficacy varies by comorbidity, traditional studylevel meta-analysis of published findings are likely to be inadequate as trials rarely report subgroup effects by comorbidity, and those that do may be subject to publication bias. In such circumstances, any assessment of the effect of comorbidity is therefore based on between-trial comparisons that are prone to bias [13]. Individual-participant data meta-analysis has the potential to overcome these problems. We previously demonstrated, using data from >100 industry-sponsored clinical trials, that it was possible to identify comorbidities in most trials and that multimorbidity was common (although underrepresented) in trial populations [4,14,15]. Furthermore, in a recent simulation study, we demonstrated that combining trials on all comparisons for a given indication in Bayesian hierarchical models has several desirable properties in terms of estimating treatment effect modification by comorbidity [16]. First, precision is higher compared to single-comparison meta-analyses, increasing the likelihood of detecting small (but clinically relevant) subgroup effects where these are present. Secondly, extreme values are attenuated towards the null (shrinkage), reducing the risk of false positive findings [16]. Bayesian hierarchical models may therefore be a useful tool to assess treatment efficacy estimates in the context of multimorbidity.
This study aims to assess whether treatment effects are modified in the presence of comorbidity, by using individual participant data (IPD) from 120 trials to assess whether treatment efficacy for 23 index conditions differs by (i) number of additional long-term conditions (comorbidity count); (ii) the 6 commonest comorbidities for each index condition; (iii) by continuous biomarkers associated with comorbidity.

Study design
For trials of 23 index conditions, we identified comorbid long-term conditions using IPD for each trial. We then summarised these as a comorbidity count (in addition to the index condition) for each participant. Further, we identified the 6 commonest comorbidities for each index condition across trials and defined a presence/absence variable for each. We estimated differences in treatment efficacy by fitting regression models to IPD for each trial to obtain trial-level estimates of covariate-treatment interaction effects. We fit models for age and sex alone, for a comorbidity count, and for each of the 6 commonest comorbidities for each index condition. Trial-level estimates were then meta-analysed to obtain drug and index conditionspecific estimates of treatment effect modification by comorbidity. This process is summarised in Fig 1 and explained in detail below.
All analyses were conducted in R (R Core Team, 2021). Analysis code, metadata (indicating, for example, how treatment arms and outcomes were selected), and data (except trial IPD) are available on the project github repository (https://doi.org/10.5281/zenodo.7713360).

Data sources
Trials were identified according to a prespecified protocol [17]. We focused on trials of pharmacological agents for 23 index conditions (Table 1). Eligibility criteria were RCTs for one of the index conditions; registered with the United States Clinical trials registry (clinicaltrials.gov) on or after January 1990; phase 2/3, 3, or 4; including �300 participants; and with eligibility defined using an upper age limit of 60 years or more or no upper age limit. Smaller studies and studies with lower age limits were excluded as they were considered less likely to include sufficient people with comorbidity. From a list of all registered, eligible trials we then identified trials for which IPD were available from one of 2 repositories: Clinical Study Data Request (CSDR) or the Yale Open Data Access (YODA) repository. These repositories facilitate sharing of industry-sponsored trial data with third-party researchers. The process of trial identification is described in detail elsewhere [4].

Quantifying comorbidity
For each participant with a specified index condition in each of the included trials, we identified comorbidities from a prespecified list of 21 conditions (cardiovascular disease, chronic pain, arthritis, affective disorders, acid-related disorders, asthma/chronic obstructive pulmonary disease (COPD), diabetes mellitus, osteoporosis, thyroid disease, thromboembolic disease, inflammatory conditions, benign prostatic hyperplasia, gout, glaucoma, urinary incontinence, erectile dysfunction, psychotic disorders, epilepsy, migraine, parkinsonism, and dementia) [4]. These comorbidities were based on previous work identifying comorbidities within trial IPD and were based on assessment of medical history and concomitant medication data. In this previous work, we demonstrated that while for many trials medical history had  been redacted, data on concomitant medications were widely available and could be used to define comorbidities [4]. This involved combining some conditions into the same definition (e.g., asthma and COPD, which could not be differentiated based on medication use alone). These definitions were based on the World Health Organisation Anatomic Therapeutic Classification and are described in our previous publication and available on the project github repository [4]. Where medical history data was available and coded using the Medical Dictionary for Regulatory Activities (MedDRA) coding system, we also identified the same conditions using MedDRA codes.

Comorbidity count
For the primary analysis, we created a comorbidity count for each participant. This was the total number of comorbidities present, not including the index condition. This count was used as a numerical variable in all analyses.

Individual comorbidities
For each index condition, we also identified the 6 most common comorbidities from the full list of 21 possible comorbidities. These individual comorbidities were analysed as binary variables (reflecting the presence of absence of that specific comorbidity).

Selected biomarkers/risk factors
In addition to the 21 comorbidities defined using medication and/or medical history, we identified 5 continuous biomarkers that may indicate comorbidity (e.g., renal impairment, hypertension, anaemia, or liver disease) or risk factors (e.g., obesity). These were based on baseline trial measurements: estimated glomerular filtration rate (eGFR, as a marker of renal impairment, taken from trial data where this was available and calculated from creatinine, age, sex, and race using the Modification of Diet in Renal Disease (MDRD) equations if it was not), body mass index (as recorded or calculated based on height and weight), fibrosis-4 (FIB-4) index (as a marker of liver disease calculated from aspartate aminotransferase, alanine transaminase, and platelet counts), haemoglobin, and mid-blood pressure (MBP, defined as 0.5 × (systolic blood pressure + diastolic blood pressure)).

Demographics
Age and sex were extracted from each trial based on the trial recorded values at randomisation.

Treatment arms
Treatment arm comparisons were prespecified prior to undertaking the outcome analyses. For multiarm trials, the most extreme arms were selected for comparison (e.g., if different dosages were used, the highest dose was compared to placebo or usual care-e.g., canagliflozin 300 mg, rather than 100 mg, versus placebo). Where placebo or usual care was included as a trial arm, this was selected as the comparator. Otherwise, we chose the arm with the least recently developed treatment as the comparator arm. This was to give the best chance of identifying effect modification, with the resulting analysis representing an upper limit on the degree of effect modification observed.

Outcomes
We aimed to identify outcomes common across trials to facilitate meta-analysis. We obtained information from clincialtrials.gov via the Database for Aggregated Analysis of ClinicalTrials. gov (AACT; https: //ctti-clinicaltrials.org/citation-policy/) on all outcomes (primary and secondary) for each trial. For each index condition, we then identified 1 or more outcomes that appeared to be common to multiple trials (e.g., forced expiratory volume in 1 s (FEV1) in COPD trials, 6-min walk distance (6MWD) in pulmonary hypertension trials). Within the trial repositories, we then reviewed the trial documentation to identify these outcomes for each trial. For trials of anticoagulants, in addition to the efficacy outcome, we also analysed bleeding events as these are a common and clinically important adverse outcome.

Statistical analyses
In 4 separate analyses we (i) estimated age and sex-treatment interactions without including comorbidity; (ii) estimated comorbidity-treatment interactions for the comorbidity count; (iii) estimated comorbidity-treatment interactions for the 6 commonest comorbidities for each index condition; and (iv) examined covariate-treatment interactions for continuous biomarkers. Full descriptions of the modelling are provided in the Supporting information appendix (S1 File) and are described briefly below.

IPD analysis
For trials where the outcome was a continuous variable, for each trial and analysis the change in each outcome was modelled using linear regression. For analysis (i), the final measure was regressed on the baseline measure, age (modelled as a continuous variable scaled to 15-year increments, which was close to the standard deviation for most trials), sex (male versus referent group of females), arm (binary variable treatment/control), and interactions with arm for each covariate. For American College of Rheumatology-N (ACR-N, a measure of improvement in disease activity in rheumatoid arthritis, which is itself a measure of change), we did not include the baseline measure as a covariate. We then repeated this modelling for the remaining analyses (ii to iv) adding comorbidity covariates in addition to age and sex (comorbidity count, specific comorbidities, and continuous biomarkers for analyses (ii to iv), respectively). From these models, the model coefficients, standard errors, and variance-covariance matrices were obtained and exported from the YODA and CSDR secure analysis platforms. For trials where the outcome was a count or a binary variable, we fitted similar models using Poisson regression and logistic regression, respectively.

Meta-analysis
For the continuous outcomes, in order to convert the measures onto a similar scale, we divided the estimates and standard errors by the minimum clinically important difference (MCID) for that measure. For most outcome measures, higher scores indicate worse outcomes (e.g., Bath Ankylosing Spondylitis Disease Activity Index (BASDI)). Where this was not the case (e.g., FEV1), we multiplied the values by minus one so that the direction of effect was the same for all trials. For the variance-covariance matrix, we divided each element by the MCID-squared. The MCID was selected using the published literature by hand-searching papers in the Core Outcome Measures in Effectiveness Trials (COMET) database for relevant conditions [18]. This search was supplemented by simple internet searches (Google searches using the full and abbreviated names for each outcome and MCID, MID, "minimum clinically important difference," or "minimum important difference"). Where no published MCID recommendations could be found, we used the MCID defined in the power calculations in the trial protocols. At this stage, for each index condition, we restricted the analysis to the single most common outcome across trials. In 2 index conditions (Ankylosing spondylitis and hypertension), 2 outcomes were equally common; BASDAI and Bath Ankylosing Spondylitis Functional Index (BASFI) and diastolic blood pressure and systolic blood pressure, we arbitrarily chose BASDAI in the former case and chose systolic blood pressure in the latter as it is more prominent in clinical decision-making.
For each drug class, the model outputs were then meta-analysed. We used random-effects meta-analyses where 5 or more trials were included within the same drug class, and fixed effects where there were fewer than 5 trials. We used Bayesian models since this allowed us to simultaneously model multiple coefficients (e.g., age-treatment and sex-treatment interactions). The Bayesian models were fit using the brms package [19]. Samples from the posterior distribution were obtained and summarised as the mean and 95% credible intervals (CI). P-values were not presented as this was a Bayesian analysis. The full posteriors are provided in the project repository (doi:10.5281/zenodo.7713360).
In case other researchers wish to use the results of our models of treatment-covariate interactions to inform subsequent analyses as informative priors, we obtained summaries of the posterior predictions. We did so only for analysis (ii) for continuous outcomes. In order to provide a more general set of priors, we also predicted the comorbidity count-treatment interaction for treatment comparisons/conditions not included in our model by obtaining samples from the posteriors. The latter are provided to allow researchers to conduct Bayesian analyses or probabilistic sensitivity analyses if studying conditions/treatment comparisons not included in our modelling as this represents a prediction for an unobserved index condition/treatment comparison (albeit one which is assumed to be exchangeable with the conditions/treatment comparisons included in the current analysis). We then summarised these samples by fitting a Student's t-distribution. As with the main analysis, these models were fitted using the brms package (S1 File).

Ethical approval
This project had approval from the University of Glasgow, College of Medicine, Veterinary and Life Sciences ethics committee (200160070).

Trial characteristics
Trial baseline characteristics have been reported previously [4]. For trials with continuous outcomes, there were 20 index conditions and 47 treatment comparisons across a total of 106 trials (n = 88,150 participants). For 9 index conditions, there was only 1 treatment comparison across all trials. Diabetes, which was the condition for which there were the most trials (22), had the largest number of treatment comparisons (9) ( Table 1). Within each model, all trials had a single common outcome except inflammatory bowel disease, where the ulcerative colitis trials used the MAYO score and Crohn's disease trials used the Crohn's Disease Activity Index score. For trials with categorical outcomes, there were 3 index conditions (migraine, osteoporosis, and thromboembolism) and 11 treatment comparisons across a total of 17 trials (n = 11,624 participants). For thromboembolism, there were 3 more specific categories of indication-primary prevention (5 trials), secondary prevention (2 trials), and treatment (2 trials).

Continuous outcomes-Age-and sex-treatment interactions
For all conditions with continuous outcomes, interaction terms for age-and sex-treatment interactions are shown in Table 2. For most drug classes, interaction terms for age included the null, indicating no statistically significant associations consistent with modification of treatment efficacy by age. However, in the diabetes trials, there appeared to be an attenuation in the treatment effect with increasing age for 3 drug classes (0.07 (95% CI 0.00, 0.13) for sulfonylureas versus SGLT2 inhibitors, 0.09 (95% CI 0.01, 0.17) for DPP-4 inhibitors versus SGLT2 inhibitors, and 0.07 (95% CI 0.04, 0.11) for SGLT2 inhibitors versus placebo). Taking SGLT2 inhibitors versus placebo as an example, this can be read as follows-"the lowering effect on HbA1c of SGLT2 inhibitors versus placebo is 0.28 (95% CI 0.16, 0.44) mmol/mol smaller (since the MCID for HbA1c is 4 mmol/mol) per 15-year increment in age, for age 50 years versus age 80 years, this corresponds to the effect being 0.56 (95% CI 0.32, 0.88) mmol/mol smaller. Similarly, most interaction terms for sex included the null, with a few exceptions ( Table 2). For example, for glucagon-like peptide-1 (GLP-1) analogues, the interaction term for sex was 0.29 (0.12, 0.49) indicating that the lowering effect on HbA1c of GLP-1 analogues is 1.16 (0.48, 1.96) mmol/mol smaller in men than in women.

Continuous outcomes-Comorbidity-treatment interactions
For each drug class, Figs 2 to 6 show the main treatment effect (black points, expressed as change in minimally clinically important difference) and the estimate for the comorbiditytreatment interaction based on a comorbidity count (red points) meta-analysed within treatment indications.  Fig 7. Comorbidity count was not associated with any attenuation or strengthening in treatment efficacy; in all cases, the 95% CIs included the null. This suggests that for all treatments and in all index conditions, it is plausible that there is no difference in treatment effect by comorbidity (on the absolute scale) within the range of comorbidity counts observed in the trials. When examining comorbidity-treatment interactions for the 6 most common comorbidities within each index condition, 95% CIs included the null for all estimates (S1 Table). Similarly, when assessing modification of treatment efficacy by continuous biomarkers, all estimates included the null (S2 Table).
In a sensitivity analysis, rather than using a fixed effects model for meta-analyses where there were fewer than 5 trials, we used a random effects model. The 95% CIs were wider, but the results of these models were otherwise similar to those presented in the main analysis (S3 Table).

Informative priors for subsequent analyses including different index condition/treatment comparisons
On predicting treatment effect modification by comorbidity count for a notional unobserved condition and notional unobserved treatment comparison, the samples from the posterior were approximately t-distributed (central estimate = 0.01, dispersion = 0.01, degrees of freedom = 3.24).

Categorical outcomes-Morbidity count-treatment interactions
For the 3 index conditions with categorical outcomes (Table 1), there was no evidence of any comorbidity count-treatment interactions. These findings are summarised in Table 3.

Sex-treatment interaction
Ankylosing

Discussion
In an IPD meta-analysis of 120 trials, we examined whether the efficacy of drug treatments differed by comorbidity. For 20 index conditions where the outcome variable was continuous (e.g., glycosylated haemoglobin in diabetes trials), efficacy did not differ by the total number of comorbidities or by the presence or absence of specific comorbidities. Similarly, for 3 conditions (17 trials) examining outcomes which were discrete events (e.g., thromboembolism, bleeding, headaches, and fractures), there was no evidence of treatment effect modification by comorbidity count or by specific comorbidities. Several previous studies have reported findings on treatment effect modification in IPD meta-analyses and meta-analyses of reported subgroup effects. However, these have largely been confined to major cardiovascular disease trials (e.g., for showing similar efficacy of statin in people with and without diabetes [20], differential benefit of blood pressure lowering therapy in people with and without diabetes [21], or showing questionable net benefit of aspirin in primary prevention [22]) or to concordant conditions defined as those closely related to the index condition or target event for the trial (such as hypertension in stroke trials [23]). These studies have not considered the impact of comorbidity more broadly or of discordant comorbidities not related to the index condition of the trial. This represents an important omission, because there are a number of mechanisms by which the presence of discordant conditions might plausibly modify treatment efficacy (positively or negatively) including increased diagnostic misclassification, altered pharmacokinetics, or pharmacodynamics (e.g., altered drug excretion in people with mild renal impairment or increased benefits of antiplatelet drugs in the presence of coexistent inflammatory conditions) and altered treatment-related behaviours (e.g., better or worse treatment adherence due to existing treatment regimens). Our study adds to this sparse literature showing that, on average, treatment effects are similar across different populations within trials (at modest comorbidity counts of 3 or fewer). This supports the standard assumption that treatment effects are similar when generalising from trial to non-trial eligible populations, at least for populations with limited prevalence of comorbidity such as in these trials.
Although we found that treatment efficacy did not differ by comorbidity count, net overall treatment benefits may nonetheless differ in people with differing degrees of comorbidity. This is because differences in the baseline risk (e.g., the absolute risk of the outcome that the treatment is intended to prevent), differences in susceptibility to treatment-related adverse events, differences in competing risks (e.g., absolute risk of mortality from other causes), and

Sex-treatment interaction
Pulmonary differences in the burden of treatment (e.g., higher treatment burden in the context of multimorbidity leading to reduced concordance or reduced quality of life) may all lead to differences in the net overall benefit of treatment [24]. Therefore, where comorbidity alters the (baseline) natural history of diseases, the likelihood of adverse treatment effects (e.g., comorbid

Fig 3. Main treatment effect and comorbidity-treatment interactions (diabetes, GORD, hypertension, and IBD):
This plot shows the main treatment effect (black) and the comorbidity-treatment interaction (red) based on a comorbidity count. Trial-level estimates (circles) and metaanalysed estimates (diamonds) are presented along with 95% CIs (whiskers). Details of effect [3]estimates, heterogeneity, and model diagnostics can be found here: https://zenodo.org/badge/latestdoi/611754942. CI, credibility interval; GORD, gastro-oesophageal reflux disease; IBD, inflammatory bowel disease. https://doi.org/10.1371/journal.pmed.1004176.g003 renal impairment) or life expectancy (e.g., via discordant comorbidities associated with mortality), the effects of treatment must differ even assuming that there is no difference in efficacy. For example, while there is strong evidence that the benefits of dual antiplatelet therapy (DAPT) following myocardial infarction (versus a single antiplatelet) outweigh the risks This plot shows the main treatment effect (black) and the comorbidity-treatment interaction (red) based on a comorbidity count. Trial-level estimates (circles) and metaanalysed estimates (diamonds) are presented along with 95% CIs (whiskers). Details of effect estimates, heterogeneity, and model diagnostics can be found here: [4]https://zenodo.org/badge/latestdoi/611754942. CI, credibility interval; IBD, inflammatory bowel disease.
https://doi.org/10.1371/journal.pmed.1004176.g004 [25,26], this may not be true for patients with coexisting COPD. Cardiovascular mortality is commoner in COPD than the general population, favouring DAPT [27]. However, non-cardiovascular mortality is also higher [28], favouring single-antiplatelet therapy because of competing risks. Intensive control of blood glucose and other risk factors in diabetes [29,30] and This plot [5]shows the main treatment effect (black) and the comorbidity-treatment interaction (red) based on a comorbidity count. Trial-level estimates (circles) and meta-analysed estimates (diamonds) are presented along with 95% CIs (whiskers anticoagulant use in atrial fibrillation [31] provide similar examples where the net overall treatment benefits are uncertain for people with comorbidity. This is the first IPD clinical trial meta-analysis, as far as we are aware, to examine whether treatment efficacy differs by comorbidity. Nonetheless, there are several important limitations. First, while for some index conditions (e.g., diabetes) there were many trials, for others there were few trials and so relatively few participants, limiting the precision with which covariatetreatment interactions could be estimated. Furthermore, the individual trials were neither designed not powered to measure comorbidity-treatment interactions. Specifically, the higher levels of comorbidity observed in clinical practice (e.g., 5 or more comorbidities) are rare within the trial participants. This reduces the likelihood of detecting a comorbidity-treatment interaction were one to exist. Therefore, while our results are consistent with there being no comorbidity-treatment interactions, this should be interpreted within the range of comorbidities, index conditions, and treatment comparisons that are presented.
Second, most trials were phase 3 trials focussed on efficacy outcomes (e.g., change in a disease marker such as blood pressure or glycosylated haemoglobin) rather than pragmatic trials focussed on harder outcomes (such as the incidence of specific adverse health outcomes). The findings for the smaller number of trials (17 in total) where we did have harder outcomes (headaches, bleeding, thromboembolism, and fracture) were similar to the findings for the remaining trials; there was no evidence of treatment effect modification by comorbidity count on the conventional scale (additive for continuous outcomes and relative for noncontinuous outcomes). Nonetheless, the small number of trials and indications where hard outcomes were studied means that caution is needed in extrapolating our findings to trials or meta-analyses focussing on such outcomes. Also, for some conditions and indications, the main effects were small or included the null. Where this was the case, the chances of detecting treatment effect modification are lower.
Third, while this analysis assesses treatment efficacy, we did not assess whether comorbidities lead to variation in adverse effects of treatment. An appreciation of both benefits and harms is required in order to inform judgements about the net benefits of treatment in the context of comorbidity. Fourth, while we include a large number of trials across a range of index conditions, this is not a representative sample in terms of the larger body of trials. Specifically, there were all industry-sponsored trials (as the CSDR and YODA repositories only held industry-sponsored trial data for the conditions of interest). Furthermore, not all sponsors share data in this way nor do sponsors share data for all trials conducted. We have previously demonstrated that these trials were similar to the wider body of industry-sponsored trials in terms of characteristics such as size, phase, and significance of the primary outcome [4]. However, it is possible that by selecting only industry-sponsored trials inclusion criteria and selection processes may be more restrictive than for other trials. This means that, while we did not detect any evidence of treatment effect modification by comorbidity, it cannot be assumed to be absent particularly in other trials which may be more pragmatic or have less restrictive selection criteria.
Finally, while comorbidity was present in all the included trials, they remain underrepresentative in terms of the extent comorbidity [4,[32][33][34]. Specifically, there were few people in the included trials with high comorbidity counts (e.g., 4 or more conditions). This highly multimorbid population is not uncommon in routine clinical practice [35] and presents considerable challenges for clinical decision-making [3]. Their exclusion from these trials means that our findings cannot be assumed to be directly transferable to patient groups with the highest degree of multimorbidity, for whom uncertainties over the net benefit of treatments are often greatest [36,37].
Our findings have implications for the conduct of future evidence syntheses. In order to estimate net overall treatment benefits, clinical guidelines and health technology assessments routinely use evidence synthesis [38]. Such approaches combine (i) estimates of relative treatment efficacy with (ii) "natural" history (standard comparator rates) to calculate absolute effectiveness, commonly expressed as the absolute risk reduction (ARR) or number needed to treat [39]. However, hitherto evidence synthesis has rarely been used to estimate net overall treatment effects for people with multimorbidity. This may partly be due to uncertainty as to whether and how efficacy estimates differ in people with and without comorbidities. Since estimating the natural history rates of target and adverse events for people with multimorbidity is relatively straightforward using routine healthcare data (since such data are sufficiently large and rich in people with multimorbidity to produce such estimates), and within the limitations outlined above, our findings support the standard assumption of estimates of treatment efficacy being constant (at least at the modest levels observed within trial populations).
To support such evidence syntheses, we have provided a set of informative priors that can be used to propagate, into the final treatment effectiveness estimates, the additional uncertainty arising from applying estimates from clinical trials to populations rich in multimorbidity. We summarised the variation in treatment effects by comorbidity count as a set of Student's t-distributions. These distributions can be used to inform modelling studies (e.g., health technology assessments) designed to extrapolate treatment effect estimates from trial populations to routine clinical practice where multimorbidity is more common. This has the potential to better inform regulatory bodies and guideline developers as they seek to make treatment recommendations for people with multimorbidity.
Our findings also have relevance for analyses of comorbidity subgroup findings in both single clinical trials and as part of meta-analyses. The lack of information for estimating subgroup effects in clinical trials and dangers of falsely claiming spurious subgroup effects is well established and a range of approaches have been advocated for dealing with this problem. These include limiting the number of subgroups and performing corrections for multiple testing (e.g., the Bonferroni technique used in frequentist analysis), the analysis of treatment effect modification according to participant's prognostic risk scores at baseline (which reduce the dimensionality of the problem and prioritises characteristics known to predict differences in the rates of target events) [40] and in a Bayesian context the use of subject-matter expert knowledge (via prior elicitation). The prior distributions derived from our modelling for the comorbidity-treatment interactions can help inform such prior-elicitation exercises. Another technique used in Bayesian subgroup analyses is to use off-the-shelf conservative priors designed to avoid over fitting [41]; our findings will help provide reassurance that such priors are unlikely to be overly conservative for modelling comorbidity-treatment interactions.
Finally, our results have relevance for reporting of clinical trial results. Both comorbidity and frailty can be measured using data already collected from clinical trials and-as we show -it is feasible to estimate comorbidity-treatment interactions using such measures. In our project, this required access to IPD a process which is expensive (in terms of analysis time) and complex (requiring formal contractual agreements). The PATH statement advocated that clinical trials should report treatment effect modification by baseline prognostic risk score [40]. We agree that this is a useful approach because it reduces the complex problem of subgroup analysis into a single measure (reducing overfitting), and because, by definition, it targets variables which most strongly predict the risk of target events. This latter aspect is important as it helps inform evidence synthesis models applying trial results to a target population with a higher target event rate. For similar reasons, we propose that trials should also report evidence of treatment effect modification by comorbidity or degree of frailty; this would reduce the risk of overfitting by reducing comorbidity to a single variable that predicts rates of competing events. To inform judgements about net benefits, this same information should be provided for adverse events. In addition, more research is required to establish whether specific comorbidities may attenuate or strengthen treatment efficacy, as if these effects were in the opposite direction for different comorbidities, then a cumulative count of comorbidities may obscure this effect.
In conclusion, we found no evidence that treatment efficacy differed by comorbidity within the levels of comorbidity observed within clinical trial populations. This finding held whether comorbidity was measured using a simple condition count or by the presence or absence of 6 common conditions. Nonetheless, comorbidity is underrepresented in trials, especially at higher levels often seen in clinical practice, and in these contexts, the applicability of trial effect estimates needs to be carefully considered. The analysis of these trials may be used to inform subsequent evidence syntheses, analysis and reporting of individual trials, meta-analyses, and health economic models. We provide model outputs in the form of prior distributions to support such analyses.
Supporting information S1 File. Statistical methods. This file contains a more detailed description of the statistical analysis and model specifications. (DOCX) S1 Table. Comorbidity-treatment interactions for the 6 most common comorbidities within each index condition. This table shows the coefficients and 95% CIs for the comorbidity-treatment interaction for each of the 6 most common comorbidities within each index condition. Each comorbidity was modelled separately. (XLSX) S2 Table. Comorbidity-treatment interactions for continuous biomarkers. This table shows the coefficients and 95% CIs for the comorbidity-treatment interaction continuous biomarkers associated with comorbidity. Biomarkers assessed were estimated glomerular filtration rate (eGFR, as a marker of renal impairment, taken from trial data where this was available and calculated from creatinine, age, sex, and race using the MDRD equations if it was not), body mass index (as recorded or calculated based on height and weight), fibrosis-4 (FIB-4) index (as a marker of liver disease calculated from aspartate aminotransferase, alanine transaminase, and platelet counts), haemoglobin, and MBP (defined as 0.5 × (systolic blood pressure + diastolic blood pressure)). (XLSX) S3 Table. Random effects meta-analyses for indications with <4 trials. This table presents the results for meta-analyses of less than 5 trials using a random effects meta-analysis presenting alongside the fixed-effects findings from the main manuscript. (XLSX)